Skip to content

Conversation

@robcasloz
Copy link
Contributor

@robcasloz robcasloz commented May 11, 2023

The canonicalization of MinI/MaxI chains into right-spline graphs within MinINode/MaxINode::Ideal() inhibits the vectorization of reductions of these nodes. This changeset reworks MinINode/MaxINode::Ideal() to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly. The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where c0 and c1 are constants and MAX is a compile-time operation:

  1. max(x + c0, max(x + c1, z)) (or a permutation of it) to max(x + MAX(c0, c1), z).
  2. max(x + c0, x + c1) (or a permutation of it) to x + MAX(c0, c1).

Here is an example of the four permutations handled in step 1 with x = RShiftI, c0 = 100 or 150, c1 = 150 or 100, and z = ConI (#int:200):

two-level-idealization

Here is an example of the two permutations handled in step 2 with x = RShiftI, c0 = 10 or 11, and c1 = 11 or 10:

one-level-idealization

The changeset implements MinINode/MaxINode::Ideal() in a common method MaxNode::IdealI(), since the algorithm is symmetric for both node types. The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized.

Testing

Functionality
  • tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode).
Performance
  • Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed.

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int

Reviewers

Contributors

  • Jatin Bhateja <jbhateja@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/13924/head:pull/13924
$ git checkout pull/13924

Update a local copy of the PR:
$ git checkout pull/13924
$ git pull https://git.openjdk.org/jdk.git pull/13924/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 13924

View PR using the GUI difftool:
$ git pr show -t 13924

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/13924.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented May 11, 2023

👋 Welcome back rcastanedalo! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented May 11, 2023

@robcasloz The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label May 11, 2023
@robcasloz
Copy link
Contributor Author

/contributor add @jbhateja

@openjdk
Copy link

openjdk bot commented May 11, 2023

@robcasloz
Contributor Jatin Bhateja <jbhateja@openjdk.org> successfully added.

@robcasloz robcasloz marked this pull request as ready for review May 12, 2023 07:52
@openjdk openjdk bot added the rfr Pull request is ready for review label May 12, 2023
@mlbridge
Copy link

mlbridge bot commented May 12, 2023

Webrevs

@eme64
Copy link
Contributor

eme64 commented May 17, 2023

Nice work with the tests, it's good to have some specific IR tests there!

I hope we can also generalize this for MaxL/MinL (once we do this JDK-8307513) - I think that is now also going to be easier with your refactoring towards MaxNode::IdealI.

return new MinINode(add_transformed, inner_other);
} else {
return new MaxINode(add_transformed, inner_other);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make use of MaxNode::build_min_max?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (extracting common functionality into build_min_max_int()), thanks!

Comment on lines 1126 to 1141
static ConstAddOperands as_add_with_constant(Node* n) {
ConstAddOperands null(nullptr, 0);
if (n->Opcode() != Op_AddI) {
return null;
}
Node* x = n->in(1);
Node* c = n->in(2);
if (!c->is_Con()) {
return null;
}
const Type* c_type = c->bottom_type();
if (c_type == Type::TOP) {
return null;
}
return ConstAddOperands(x, c_type->is_int()->get_con());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what it was on my last review:

// Return:
// <x, C>,       if n is of the form x + C, where 'C' is a non-TOP constant;
// <nullptr, _>, if n is of the form x + C, where 'C' is a TOP constant;
// <n, con>      otherwise.
static Node* constant_add_input(Node* n, jint* con) {
  if (n->Opcode() == Op_AddI && n->in(2)->is_Con()) {
    const Type* t = n->in(2)->bottom_type();
    if (t == Type::TOP) {
      return nullptr;
    }
    *con = t->is_int()->get_con();
    n = n->in(1);
  }
  return n;
}

Here, you used to also allow packing just a single n, and leave the constant as zero. Did you remove this possibility on purpose? Now n must be an AddI.

This used to allow cases like this to be folded:
max(max(a, b), a + 1) -> max(a + max(0, 1), b)

Or am I missing something? Do you have tests for this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks Emanuel! The simplification of as_add_with_constant() was too aggressive, and the lost optimizations were not caught by any test or noticeable regression on the standard Java benchmark suites. I added more test cases to catch them now and reverted as_add_with_constant() to return <n, 0> for non-additions.

@eme64
Copy link
Contributor

eme64 commented May 31, 2023

@robcasloz it looks much better, thanks for refactoring :)
I have left a few more comments.

@robcasloz
Copy link
Contributor Author

@eme64 Thanks for your thorough review, I addressed your new comments and suggestions. Please let me know if there is anything else you would like to change.

}
Node* add_transformed = phase->transform(add_extracted);
Node* inner_other = inner_op->in(inner_add_index == 1 ? 2 : 1);
return build_min_max_int(add_transformed, inner_other, opcode == Op_MaxI);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did something prevent you from directly using MaxNode::build_min_max?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically I think that should be possible, but I find that extracting the core logic into a separate function is more readable (makes it straightforward to understand the effect of the call in MaxNode::IdealI()) and efficient (avoids a redundant application of PhaseGVN::transform() to the newly created MinI/MaxI nodes).

Copy link
Contributor

@eme64 eme64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it looks much cleaner, thanks for the new changes!
Thanks for the extra tests.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me too.

@openjdk
Copy link

openjdk bot commented Jun 1, 2023

@robcasloz This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int

Co-authored-by: Jatin Bhateja <jbhateja@openjdk.org>
Reviewed-by: epeter, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 17 new commits pushed to the master branch:

  • 6edd786: 8309265: Serial: Remove the code related to GC overheap limit
  • 61bb014: 8309254: Implement fast-path for ASCII-compatible CharsetEncoders on RISC-V
  • 62c935d: 8308978: regression with a deadlock involving FollowReferences
  • aff9cea: 8303530: Redefine JAXP Configuration File
  • 1bb037b: 8309329: com/sun/jdi/DeferredStepTest.java fails with virtual threads due to not waiting for threads to exit
  • a23bbea: 8309241: ClassForNameLeak fails intermittently as the class loader hasn't been unloaded
  • 931913f: 8309200: java/net/httpclient/ExecutorShutdown fails intermittently, if connection closed during upgrade
  • dc21e8a: 8296411: AArch64: Accelerated Poly1305 intrinsics
  • 59d9d9f: 8303215: Make thread stacks not use huge pages
  • cb1e5e3: 8309286: G1: Remove unused G1HeapRegionAttr::is_valid_gen
  • ... and 7 more: https://git.openjdk.org/jdk/compare/7b0a33600e27507546d38c53bdbc482561e1154b...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 1, 2023
@robcasloz
Copy link
Contributor Author

Thanks for reviewing, Emanuel and Vladimir! I noticed that commit 29922ea in this PR accidentally fixes a latent issue in MaxNode::build_min_max(). I have reported and fixed the issue separately in JDK-8309295. Will rebase this PR and submit a new version after re-testing.

@robcasloz
Copy link
Contributor Author

The latest version (v4) merges the fix from JDK-8309295 and prevents unnecessary idealization of MinI/MaxI nodes with transitive TOP constant inputs, which matches the original logic more closely. @eme64 @vnkozlov please re-review.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update is good.

Copy link
Contributor

@eme64 eme64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for all the refactoring!

@robcasloz
Copy link
Contributor Author

Emanuel, Vladimir, thanks again for reviewing, and particularly thanks Emanuel for the useful feedback!

@robcasloz
Copy link
Contributor Author

/integrate

@openjdk
Copy link

openjdk bot commented Jun 5, 2023

Going to push as commit 3fa776d.
Since your change was applied there have been 24 commits pushed to the master branch:

  • 22a9a86: 8309268: C2: "assert(in_bb(n)) failed: must be" after JDK-8306302
  • b6c9232: 8305225: A service broken error despite annotation processor generating it if directives listed
  • 05fb6c6: 8309336: Incorrect switch in enum not reported properly
  • 08c91c2: 8309332: RISC-V: Improve PrintOptoAssembly output of vector nodes
  • ecb1753: 8309334: ProcessTools.main() does not properly set thread names when using the virtual thread wrapper
  • ac1597b: 8309409: Update HttpInputStreamTest and BodyProcessorInputStreamTest to use hg.openjdk.org
  • fdb5893: 8309391: Remove non-failing tests from test/jdk/ProblemList-Virtual.txt
  • 6edd786: 8309265: Serial: Remove the code related to GC overheap limit
  • 61bb014: 8309254: Implement fast-path for ASCII-compatible CharsetEncoders on RISC-V
  • 62c935d: 8308978: regression with a deadlock involving FollowReferences
  • ... and 14 more: https://git.openjdk.org/jdk/compare/7b0a33600e27507546d38c53bdbc482561e1154b...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 5, 2023
@openjdk openjdk bot closed this Jun 5, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jun 5, 2023
@openjdk
Copy link

openjdk bot commented Jun 5, 2023

@robcasloz Pushed as commit 3fa776d.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated

Development

Successfully merging this pull request may close these issues.

3 participants